1 Dataset Description

A total of 26,890 individuals from 8,698 families were genotyped on the GSA-24v1-0_A1 array.

  • 15,138 males, 11,752 females.
  • Family size ranged from 2 to 10 individuals.
  • 634,709 SNPs were included in the genotype files.
  • Coordinates are based on Build 38.


2 Raw Genotype QC

2.1 Sex Check

  • Based on 6,995 QC'd (--geno 0.05 --maf 0.01 --hwe 1e-6 --mind 0.1) chrX SNPs.
  • 147 individuals flagged as PROBLEM:
    • 130 with ambiguous SNPSEX (F close to the 0.2/0.8 thresholds)
    • 17 with SNPSEX different from PEDSEX (needs further investigation)
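
The SNPSEX call and the PROBLEM flag can be illustrated with a minimal Python sketch. It assumes the default PLINK `--check-sex` thresholds (F below 0.2 called female, F above 0.8 called male, anything in between left ambiguous), which is consistent with the 0.2/0.8 values quoted above. The function names and the toy records are hypothetical and are not taken from the actual pipeline.

```python
def call_snpsex(f_stat, female_max=0.2, male_min=0.8):
    """Return 1 (male), 2 (female) or 0 (ambiguous) from a chrX F estimate.

    The 0.2/0.8 cut-offs are assumed to be the PLINK --check-sex defaults.
    """
    if f_stat < female_max:
        return 2
    if f_stat > male_min:
        return 1
    return 0


def sex_check_status(pedsex, f_stat):
    """Return (snpsex, status), where status is 'OK' or 'PROBLEM'."""
    snpsex = call_snpsex(f_stat)
    # A sample passes only when the pedigree sex is known and matches the call.
    status = "OK" if pedsex in (1, 2) and pedsex == snpsex else "PROBLEM"
    return snpsex, status


# Toy records: (sample ID, PEDSEX, chrX F). Values are made up for illustration.
samples = [("S1", 1, 0.98), ("S2", 2, 0.03), ("S3", 2, 0.25), ("S4", 1, 0.01)]
for iid, pedsex, f_stat in samples:
    snpsex, status = sex_check_status(pedsex, f_stat)
    if snpsex == 0:
        kind = "ambiguous"
    elif status == "PROBLEM":
        kind = "mismatch"
    else:
        kind = "match"
    print(iid, pedsex, snpsex, status, kind)
```

In this sketch the "ambiguous" group corresponds to samples whose F falls between the two thresholds, and the "mismatch" group to samples with a confident call that disagrees with PEDSEX.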

2.1.1 Mismatch summary



2.1.2 ChrX F distributions



2.2 Pairwise IBD estimation

  • Relationships (RT): OT (Others), FS (Full Siblings), HS (Half Siblings), PO (Parent Offspring)
  • IBS sharing by relationship type ranged
    • from 0.20 to 1.00 in FS,
    • from 0.45 to 0.64 in PO,
    • from 0.00 to 0.55 in OT.
  • These ranges indicate inbreeding between some parents and possible relatedness between families, as multiplex families are included.
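
Per-relationship ranges like those above can be tabulated with a short Python sketch. It assumes a whitespace-delimited table laid out like a PLINK `.genome` file, with an `RT` column and a numeric sharing column. Which column the report actually summarised is not stated, so the column name is a parameter (defaulting here to `PI_HAT` as an assumption); the function name and the toy rows are hypothetical.

```python
import io
from collections import defaultdict


def sharing_range_by_rt(handle, value_col="PI_HAT"):
    """Return {RT: (min, max)} for one numeric column of a .genome-style table."""
    header = handle.readline().split()
    rt_idx, val_idx = header.index("RT"), header.index(value_col)
    values = defaultdict(list)
    for line in handle:
        fields = line.split()
        if fields:  # skip blank lines
            values[fields[rt_idx]].append(float(fields[val_idx]))
    return {rt: (min(v), max(v)) for rt, v in values.items()}


# Toy input with made-up values; a real file has more columns.
toy = io.StringIO(
    "FID1 IID1 FID2 IID2 RT PI_HAT\n"
    "F1 A F1 B FS 0.48\n"
    "F1 A F1 C FS 0.53\n"
    "F1 A F1 P PO 0.50\n"
    "F1 A F2 X OT 0.00\n"
    "F1 B F2 X OT 0.12\n"
)
ranges = sharing_range_by_rt(toy)
for rt in sorted(ranges):
    low, high = ranges[rt]
    print(f"{rt}: {low:.2f} to {high:.2f}")
```

If the summarised column is an IBD proportion, PO and FS pairs are expected near 0.5 and unrelated pairs near 0, so OT pairs well above 0 are the ones worth following up.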


2.2.1 Estimated pairwise IBD distributions



2.3 Individual genome-wide heterozygosity

2.3.1 Genome-wide heterozygosity VS missing rates



2.3.2 Genome-wide F VS missing rates



3 Imputation

3.1 Pre-imputation

The imputation pipeline follows that used for the SSC dataset. A total of 26,867 individuals, with ~400K autosomal and ~7K chrX SNPs, were used for imputation.

  • filters: --geno 0.05 --mind 0.1 --maf 0.01 --hwe 1e-6
    • 23 people removed due to missing genotype data (--mind).
    • Total genotyping rate in remaining samples is 0.981795.
    • 41,406 variants removed due to missing genotype data (--geno).
    • 62,974 variants removed due to Hardy-Weinberg exact test (--hwe).
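
What the missingness and frequency filters do can be sketched in Python on a toy genotype matrix. This is an illustration only, not the pipeline itself: it assumes the sample filter (--mind) is applied before the variant filters, as the order of the log lines above suggests, and it omits the Hardy-Weinberg exact test (--hwe). Genotypes are coded as 0/1/2 alternate-allele counts with `None` for missing; the function names and the toy data are hypothetical.

```python
def sample_missing_rate(row):
    """Fraction of missing genotype calls for one sample."""
    return sum(g is None for g in row) / len(row)


def variant_stats(column):
    """Return (missing rate, minor allele frequency) for one variant."""
    called = [g for g in column if g is not None]
    miss = (len(column) - len(called)) / len(column)
    if not called:
        return miss, 0.0
    freq = sum(called) / (2 * len(called))  # alternate-allele frequency
    return miss, min(freq, 1 - freq)


def qc_filter(genotypes, mind=0.1, geno=0.05, maf=0.01):
    """Return (kept sample indices, kept variant indices)."""
    # Step 1 (--mind): drop samples whose missing rate exceeds the threshold.
    kept_samples = [i for i, row in enumerate(genotypes)
                    if sample_missing_rate(row) <= mind]
    kept_rows = [genotypes[i] for i in kept_samples]
    # Step 2 (--geno, --maf): evaluate variants on the remaining samples only.
    kept_variants = []
    for j in range(len(genotypes[0])):
        miss, minor = variant_stats([row[j] for row in kept_rows])
        if miss <= geno and minor >= maf:
            kept_variants.append(j)
    return kept_samples, kept_variants


# Toy matrix: 4 samples x 4 variants. Thresholds are loosened for the tiny example.
genotypes = [
    [0, 1, 2, 0],
    [0, 1, None, 0],
    [None, None, None, 0],
    [0, 2, 1, 0],
]
kept_samples, kept_variants = qc_filter(genotypes, mind=0.5, geno=0.3, maf=0.05)
print("samples kept:", kept_samples)
print("variants kept:", kept_variants)
```

The sample count reported above is consistent with the --mind step alone: 26,890 - 23 = 26,867 individuals.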


3.2 After Imputation

3.2.1 Frequency distribution



3.2.2 PCA